#TidyTuesday - Powerlifting
Powerlifting is a strength sport and weight training exercise that consists of three attempts at maximal weight on the following three lifts:
1.Squat
2.Bench press
3.Deadlift.
Powerlifting is becoming a popular sport now-a-days. The idea behind powerlifting is not focused on gaining muscle but rather building strength to lift as much weight as possible.
#About the Data
We are exploring Powerlifting Dataset focusing on international powerlifting compitations. This dataset was published in #TidyTuesday - R4DS Online Learning Community on 10-8-2019.
https://github.com/rfordatascience/tidytuesday
The source data is available at : https://openpowerlifting.org/data
The dataset contains 41,152 observations and 16 variables.Only IPF federation records are analyzed.
Each lifter’s efforts in different events are represented by a single row.
#MissingData
We choose some key variables and plot a simple summary of missingness.
# A tibble: 7 x 3
variable n_miss pct_miss
<chr> <int> <dbl>
1 age 2906 7.06
2 best3bench_kg 2462 5.98
3 best3deadlift_kg 14028 34.1
4 best3squat_kg 13698 33.3
5 bodyweight_kg 187 0.454
6 date 0 0
7 sex 0 0
This is the animated dumbbell plots of the difference between the best male & female powerlifters at IPF events, Using #gganimate (and faceting w/ {magick}).
github link: https://connorrothschild.github.io/tidytuesday/2019-10-08/index
This static visualization shows the heaviest lifts from each year for both men and women categories.
The dumbell plot is matching with the theme of the dataset very well
It’s also a new kind of plot that we have not seen before
But it is lacking some sense of time and user interactiveness
This plot added the time dimension to the plot by displaying the visual around in a time series
This is definitely a great improvement from the previous one
But this graph do not append the yearly data and hence the user might immediately forget the previous year’s values
Hence the user will not able to compare the change over the years and will not be able to see the trend
This graph resolves the issue where it keeps on appending the previous year’s data
The user is able to follow the change now and also able to visualize the trend
But this graph gives a little bit different visual of the data
Placing Animations side-by-side with magick.
Showing both the graphs together is a huge improvement as it lets the user see the actual values, the change and also the trend over years
But there can be more ways to find out insights from the data and look at the distribution and hidden trends
Also the visuals can be made interactive to engage the user more with the data, hence come our visuals and ideas
This visual shows the distribution of weights lifted by different age groups with the quartiles separated by color (Viridis)
The third quartile (Green) contains 50 to 75 precentile points and shows that people of ages between 24 to 34 lift the highest weights
Beyond that age group, the capability of lifting weights decreases gradually
This animated vizualization shows how men and women differ in Best of 3 deadlifts by decades.
-Male lifters are able to lift higher maximal weights than women.
-In the inital decade there is no data available for women participants.
-There is no trend shown in maximum weights lifted over the time in men or women.
-In the Recent decades there is an increase in data available.
Our GitHub repo: https://github.com/RamyaPrakashPT/DataVisualization-DesignContest
Our Rpubs link: http://rpubs.com/soumyadipmitra/designcontest
Our BlogPost links:
---
title: "Design Contest"
output:
flexdashboard::flex_dashboard:
source_code: embed
theme: spacelab
social: [ "twitter", "facebook", "menu" ]
---
```{r include=FALSE}
library(flexdashboard)
library(ggplot2)
library(tidyverse)
library(tidyr)
library(lubridate)
```
```{r load-data}
ipf_lifts <- read_csv("data/ipf_lifts.csv")
```
```{r clean_data-01}
ipf_lifts1 <- ipf_lifts %>%
mutate(year = lubridate::year(date))
ipf_lifts_reshape <- ipf_lifts1 %>%
tidyr::pivot_longer(cols = c("best3squat_kg", "best3bench_kg", "best3deadlift_kg"), names_to = "lift") %>%
select(name, sex, year, lift, value)
```
```{r clean_data-02}
ipf_lifts_maxes <- ipf_lifts_reshape %>%
group_by(year, sex, lift) %>%
top_n(1, value) %>%
ungroup %>%
distinct(year, lift, value, .keep_all = TRUE)
```
```{r clean_data-03}
max_pivot <- ipf_lifts_maxes %>%
spread(sex, value)
```
```{r clean_data-04}
male_lifts <- max_pivot %>%
select(-name) %>%
filter(!is.na(M)) %>%
group_by(year, lift) %>%
summarise(male = mean(M))
female_lifts <- max_pivot %>%
select(-name) %>%
filter(!is.na(`F`)) %>%
group_by(year, lift) %>%
summarise(female = mean(`F`))
max_lifts <- merge(male_lifts, female_lifts)
max_lifts_final <- max_lifts %>%
group_by(year, lift) %>%
mutate(diff = male - female)
```
Introduction {.storyboard}
=========================================
### #tidytuesday-POWERLIFTING
```{r intro-01,fig.height=5, fig.width=8}
library("png")
pp <- readPNG("img/Introduction.PNG")
plot.new()
rasterImage(pp,0,0,1,1)
```
***
#TidyTuesday - Powerlifting
Powerlifting is a strength sport and weight training exercise that consists of three attempts at maximal weight on the following three lifts:
1.Squat
2.Bench press
3.Deadlift.
Powerlifting is becoming a popular sport now-a-days. The idea behind powerlifting is not focused on gaining muscle but rather building strength to lift as much weight as possible.
### DATA DESCRIPTION
```{r basic-distribution, fig.height=6, fig.width=12, echo=FALSE}
p_distribution <- ipf_lifts %>%
select(starts_with("best3")) %>%
rename(Squat = "best3squat_kg",
Bench = "best3bench_kg",
Deadlift = "best3deadlift_kg") %>%
pivot_longer(cols = everything(), names_to = "type", values_to = "weight") %>%
filter(!is.na(weight) & weight > 0) %>%
ggplot(aes(x = weight)) +
geom_histogram(binwidth = 10) +
facet_wrap(facets = ~ type) +
labs(title = "Distribution of maximum weights lifted",
subtitle = "",
x = "Maximum weight lifted (kgs)",
y = "Count") +
theme(axis.ticks = element_blank())
p_distribution
```
***
#About the Data
We are exploring Powerlifting Dataset focusing on international powerlifting compitations. This dataset was published in #TidyTuesday - R4DS Online Learning Community on 10-8-2019.
https://www.tidytuesday.com/
https://github.com/rfordatascience/tidytuesday
The source data is available at : https://openpowerlifting.org/data
The dataset contains 41,152 observations and 16 variables.Only IPF federation records are analyzed.
Each lifter’s efforts in different events are represented by a single row.
#MissingData
We choose some key variables and plot a simple summary of missingness.
```{r missing-data, echo=FALSE}
library(naniar)
p_missing_data <- ipf_lifts %>%
select(sex, age, bodyweight_kg, starts_with("best3"), date) %>%
naniar::miss_var_summary()%>%
arrange(variable)
p_missing_data
```
Replication {.storyboard}
=========================================
### Replicating Connor Rothschild's #tidyTuesday submission for Powerlifting
```{r viz-01}
#install.packages("devtools")
#devtools::install_github("clauswilke/ggtext")
#devtools::install_github("connorrothschild/tpltheme")
library(tpltheme)
#install.packages("ggalt")
library(ggtext)
max_lifts_final %>%
filter(year == 2019) %>%
ggplot() +
ggalt::geom_dumbbell(aes(y = lift,
x = female, xend = male),
colour = "grey", size = 5,
colour_x = "#D6604C", colour_xend = "#395B74") +
labs(y = element_blank(),
x = "Top Lift Recorded (kg)",
title = "How Women and Men Differ in Top Lifts",
subtitle = "In 2019") +
theme(plot.title = element_markdown(lineheight = 1.1, size = 20),
plot.subtitle = element_text(size = 15)) +
scale_y_discrete(labels = c("Bench", "Deadlift", "Squat")) +
drop_axis(axis = "y") +
geom_text(aes(x = female, y = lift, label = paste(female, "kg")),
color = "#D6604C", size = 4, vjust = -2) +
geom_text(aes(x = male, y = lift, label = paste(male, "kg")),
color = "#395B74", size = 4, vjust = -2) +
geom_rect(aes(xmin=430, xmax=470, ymin=-Inf, ymax=Inf), fill="grey80") +
geom_text(aes(label=diff, y=lift, x=450), fontface="bold", size=4) +
geom_text(aes(x=450, y=3, label="Difference"),
color="grey20", size=4, vjust=-3, fontface="bold")
```
***
This is the animated dumbbell plots of the difference between the best male & female powerlifters at IPF events, Using #gganimate (and faceting w/ {magick}).
github link: https://connorrothschild.github.io/tidytuesday/2019-10-08/index
This static visualization shows the heaviest lifts from each year for both men and women categories.
- The dumbell plot is matching with the theme of the dataset very well
- It's also a new kind of plot that we have not seen before
- But it is lacking some sense of time and user interactiveness
### Animating the Top lifts visualization
```{r viz-animation-01}
#install.packages('gganimate')
#install.packages("gifski")
library(gganimate)
library(gifski)
animation <- max_lifts_final %>%
ggplot() +
ggalt::geom_dumbbell(aes(y = lift,
x = female, xend = male),
colour = "grey", size = 5,
colour_x = "#D6604C", colour_xend = "#395B74") +
labs(y = element_blank(),
x = "Top Lift Recorded (kg)",
title = "How Women and Men Differ in Top Lifts",
subtitle='\nThis plot depicts the difference between the heaviest lifts for each sex at International Powerlifting Federation\nevents over time. \n \n{closest_state}') +
theme(plot.title = element_markdown(lineheight = 1.1, size = 25, margin=margin(0,0,0,0)),
plot.subtitle = element_text(size = 15, margin=margin(8,0,-30,0))) +
scale_y_discrete(labels = c("Bench", "Deadlift", "Squat")) +
drop_axis(axis = "y") +
geom_text(aes(x = female, y = lift, label = paste(female, "kg")),
color = "#D6604C", size = 4, vjust = -2) +
geom_text(aes(x = male, y = lift, label = paste(male, "kg")),
color = "#395B74", size = 4, vjust = -2) +
transition_states(year, transition_length = 4, state_length = 1) +
ease_aes('cubic-in-out')
a_gif <- animate(animation,
fps = 10,
duration = 25,
width = 800, height = 400,
renderer = gifski_renderer("./heavy_lifts_each_sex.gif"))
a_gif
```
***
- This plot added the time dimension to the plot by displaying the visual around in a time series
- This is definitely a great improvement from the previous one
- But this graph do not append the yearly data and hence the user might immediately forget the previous year's values
- Hence the user will not able to compare the change over the years and will not be able to see the trend
### A line chart of differences over time
```{r}
animation2 <- max_lifts_final %>%
ungroup %>%
mutate(lift = case_when(lift == "best3bench_kg" ~ "Bench",
lift == "best3squat_kg" ~ "Squat",
lift == "best3deadlift_kg" ~ "Deadlift")) %>%
ggplot(aes(year, diff, group = lift, color = lift)) +
geom_line(show.legend = FALSE) +
geom_segment(aes(xend = 2019.1, yend = diff), linetype = 2, colour = 'grey', show.legend = FALSE) +
geom_point(size = 2, show.legend = FALSE) +
geom_text(aes(x = 2019.1, label = lift, color = "#000000"), hjust = 0, show.legend = FALSE) +
drop_axis(axis = "y") +
transition_reveal(year) +
coord_cartesian(clip = 'off') +
theme(plot.title = element_text(size = 20)) +
labs(title = 'Difference over time',
y = 'Difference (kg)',
x = element_blank()) +
theme(plot.margin = margin(5.5, 40, 5.5, 5.5))
b_gif <- animate(animation2,
fps = 10,
duration = 25,
width = 800, height = 200,
renderer = gifski_renderer("./difference_over_time.gif"))
b_gif
```
***
- This graph resolves the issue where it keeps on appending the previous year's data
- The user is able to follow the change now and also able to visualize the trend
- But this graph gives a little bit different visual of the data
### Animation Composition
```{r viz-animation-03}
#install.packages("magick")
library(magick)
a_mgif <- image_read(a_gif)
b_mgif <- image_read(b_gif)
new_gif <- image_append(c(a_mgif[1], b_mgif[1]), stack = TRUE)
for(i in 2:250){
combined <- image_append(c(a_mgif[i], b_mgif[i]), stack = TRUE)
new_gif <- c(new_gif, combined)
}
new_gif
```
***
Placing Animations side-by-side with magick.
- Showing both the graphs together is a huge improvement as it lets the user see the actual values, the change and also the trend over years
- But there can be more ways to find out insights from the data and look at the distribution and hidden trends
- Also the visuals can be made interactive to engage the user more with the data, hence come our visuals and ideas
Our Visuals {.storyboard}
=========================================
### Visual 1
```{r Soumyadip Visual, fig.height=5, fig.width=10}
library(ggridges)
library(scales)
library(viridis)
library(forcats)
ipf_lifts %>%
mutate(year = lubridate::year(date)) %>%
filter(year == 1994) %>%
mutate(age_class = fct_rev(as.factor(age_class))) %>%
filter(age_class != '5-12') %>%
filter(age_class != '80-999') %>%
drop_na(c(age_class,best3squat_kg)) %>%
ggplot(aes(x=best3squat_kg,y=age_class,fill=factor(..quantile..))) +
stat_density_ridges(geom="density_ridges_gradient",calc_ecdf = TRUE,
quantiles = 4,quantile_lines = TRUE,
na.rm = TRUE) +
scale_fill_viridis(discrete = TRUE,name="Quartiles") +
scale_x_continuous(limits = c(0,510),expand = c(0,0),labels=unit_format(unit="Kg")) +
theme_bw() +
labs(
title = "Distribution of Weight lifted in Squat for different Age Groups",
x = "",
y = "Age Classification"
)
```
***
- This visual shows the distribution of weights lifted by different age groups with the quartiles separated by color (Viridis)
- The third quartile (Green) contains 50 to 75 precentile points and shows that people of ages between 24 to 34 lift the highest weights
- Beyond that age group, the capability of lifting weights decreases gradually
### Visual 2
```{r RamyaPrakash Visuals}
library(gganimate)
library(gifski)
library(ggridges)
library(scales)
library(viridis)
ipf_lifts_year <- ipf_lifts %>%
mutate(year = format(date, "%Y")) %>%
mutate(decade = year(date) - (year(date) %% 10))
ipf_lifts_decade<- ggplot(data=ipf_lifts_year, mapping = aes(y=decade, x=best3deadlift_kg, fill=sex)) +
geom_density_ridges() +
labs(x = "Weight (kg)",y = "Decade",title = "How men and women differ in Best of 3 Deadlifts \n by decades") +
scale_fill_viridis(discrete = TRUE,name="Quartiles") +
scale_x_continuous(limits = c(10,500))+
theme_ridges()+
transition_manual(year)
my_gif <- animate(ipf_lifts_decade,
fps = 5,
duration = 5,
renderer = gifski_renderer("./ipf_lifts_decade.gif"))
my_gif
```
***
This animated vizualization shows how men and women differ in Best of 3 deadlifts by decades.
-Male lifters are able to lift higher maximal weights than women.
-In the inital decade there is no data available for women participants.
-There is no trend shown in maximum weights lifted over the time in men or women.
-In the Recent decades there is an increase in data available.
### Visual 3
```{r Shruti Visuals}
ipf <- ipf_lifts %>%
mutate(year = as.numeric(format(date, '%Y'))) %>%
select(-event, -division, -federation, -date) %>%
gather(activity, weight, best3squat_kg:best3deadlift_kg) %>%
group_by(year)
max <- ipf %>%
group_by(sex, activity, year) %>%
filter(place != "DD" & place != "DQ" & !is.na(weight) & weight == max(weight, na.rm = T),
year >= 1980) %>%
ungroup() %>%
group_by (name) %>%
mutate(activity = recode(activity, "best3bench_kg" = "Bench",
"best3deadlift_kg" = "Deadlift", "best3squat_kg" = "Squat"),
sex = recode(sex, "F" = "Female", 'M' = "Male")) %>%
arrange(desc(year))
competition <- max %>%
group_by(sex, name) %>%
summarise(n = n()) %>%
group_by(sex, n) %>%
summarise(total = n())
#number of max achieved
g_count <- ggplot(competition, aes(x = n, y= total, fill = sex)) +
geom_bar(stat = "identity")+
facet_grid(sex ~ .)+
scale_x_continuous(breaks = seq(0,12,1))+
scale_y_continuous(breaks = seq(0,50,5))+
labs(y= "# of participants reaching the max", x = "Number of max achieved",
title = "How do the max achievements distribute across participants?",
subtitle = "",
caption = "")+
theme(
strip.text = element_blank(),
plot.title = element_text(size = 12),
legend.title = element_blank(),
legend.position='top',
legend.background = element_rect(fill = "transparent"),
legend.box.background = element_rect(fill = "transparent"),
legend.spacing.x = unit(0.4, 'cm'),
legend.text = element_text(size = 12),
)
g_count
```
***
- Comments of Visual 3
Conclusion {.storyboard}
=========================================
### THANK YOU!!
```{r conclusion-01,fig.height=5, fig.width=8}
library(jpeg)
pg <- readJPEG("img/ThankYou.jpg")
plot.new()
rasterImage(pg,0,0,1,1)
```
***
Our GitHub repo: https://github.com/RamyaPrakashPT/DataVisualization-DesignContest
Our Rpubs link: http://rpubs.com/soumyadipmitra/designcontest
Our BlogPost links: